# Low Power and High Performance Master Slave Match Line Content Addressable Memory

Loganathan.K

Research Scholar, Nandha Engineering College, Erode, India.

Prem Kumar.P

Assistant Professor, Nandha Engineering College, Erode, India.

Abstract - Content-addressable memory (CAM) is a hardware storage device commonly used in fast lookup applications. However, its parallel comparison feature makes CAM consume a large amount of power. In this paper, we propose a new CAM word architecture, called the master-slave match line (MSML) design, which combines the master-slave architecture with a charge refill minimization technique to reduce the CAM power dissipated in the match lines (MLs). Unlike the conventional design, where a single ML is used, our design uses one master-ML (MML) and several slave-MLs (SMLs) to perform the search operation. Because the MML charge is shared only with the mismatched SMLs, our design minimizes the MML charge refill swing, so the ML power consumption can be reduced effectively. Theoretically, the ML power saving is at least 50%, independent of the search pattern and match case. Compared with the conventional NOR-type CAM design, simulation results show that the MSML design with the best configuration reduces the ML energy consumption by 7%-57%, with the saving increasing with word size. In addition, we propose a modified CAM cell that improves the MSML match performance; the resulting MSML<sub>hp</sub> design achieves 28% and 69% energy-delay product (EDP) improvements over the original MSML and traditional CAM designs, respectively, for a 128-bit word size.

Index Terms – Charge refill minimization, content-addressable memory (CAM), low-power, master–slave architecture, match line (ML).

## 1. INTRODUCTION

Content-addressable memory (CAM) is a storage device that is addressed by content (or data) rather than by memory address. It is widely used in applications that require fast table lookup. Because of frequent lookups and the parallel comparison feature, in which a large number of transistors and wires are active on each lookup, the power consumption of CAM is usually considerable. In a CAM, the match lines (MLs) and search lines are the major power consumers. Each ML is a long wire with a large capacitance, and every search causes a large amount of ML switching activity, so the ML power consumption is very large: the MLs contribute 65%–88% of the total ternary CAM power consumption.

Traditionally, there are two ML architectures: the NOR-type ML and the NAND-type ML. The NOR-type ML provides the best search performance but incurs a large ML power consumption. In contrast, the NAND-type ML trades search performance for low power. In the related work, the ML power consumption has been reduced by several methods, including ML segmentation, pipelined search schemes, and ML voltage-swing reduction. In this paper, we propose a new ML architecture, called the master–slave ML (MSML). The key concept of the MSML design is to combine the master–slave architecture with a charge sharing technique to reduce the CAM power dissipated in ML switching.

The features of the MSML design are as follows:

- 1) Unlike the conventional design, where a single ML is used, the MSML design uses one master-ML (MML) and several slave-MLs (SMLs) to perform the search operation.
- 2) Instead of discharging the entire MML to 0, only the mismatched SMLs draw charge from the MML and are then discharged, so the charge loss is minimized.
- 3) Because the MML only needs to be refilled with the charge distributed to the mismatched SMLs, which is much less than the full ML charge refilled in the conventional design, the ML power consumption can be reduced effectively.
- 4) Theoretically, the MSML reduces ML power by 50% even in the worst case. In other words, at least 50% power saving is guaranteed, independent of the match case. In practice, this optimal value is approached only in CAMs with large word sizes because of the MSML hardware overhead.
- 5) In the high-performance version, MSML<sub>hp</sub>, we further modify the CAM cell to speed up the charge sharing process, which improves the performance and the energy-delay product.



Fig. 1. (a) Typical XOR CAM cell. (b) Conventional NOR-type ML.

In our experiments, the best MSML configuration is MS<sub>4</sub>, which contains one MML and four SMLs. Compared with the conventional CAM design, the original MSML design effectively reduces the ML power consumption but incurs a large performance penalty. In contrast, the MSML<sub>hp</sub> design trades a 15% area overhead for a 20% improvement in MSML match performance, so it delivers a large EDP improvement. The results show that the MSML<sub>hp</sub> design reduces the EDP of the conventional CAM design by about 69%. Moreover, compared with two state-of-the-art low-power ML designs, SMA [13] and Shadow [17], MSML<sub>hp</sub> still achieves EDP improvements of 21% and 40%, respectively.

The rest of this paper is organized as follows. Section 2 reviews the conventional CAM organization and related work. Section 3 describes the circuitry developed for the MSML design in detail and compares our design with the related work. Section 4 discusses both the performance and power issues in the MSML implementation and then proposes the MSML<sub>hp</sub> design. Section 5 presents the experimental results, and Section 6 concludes the paper.

## 2. CONTENT-ADDRESSABLE MEMORY

A CAM consists mainly of CAM cells. Fig. 1(a) shows a typical XOR CAM cell, which consists of two parts: 1) a store unit for storing data and 2) a compare unit for comparing data. The store unit is usually implemented as a traditional 6T SRAM cell containing a cross-coupled inverter pair. The compare unit is pass-transistor logic (PTL) that compares the stored data with the search data. Depending on the application, the XOR compare unit can be modified into XNOR logic. Besides the store and compare units, a pull-down transistor M, whose gate is controlled by the comparison result, is needed to connect/disconnect the ML to/from the ground.
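The cell behavior described above can be summarized with a minimal behavioral sketch, assuming ideal logic levels: the compare unit outputs the XOR of the stored bit D and the search bit S, and that result gates the pull-down transistor M. The helper names are hypothetical, and all analog effects (PTL voltage drop, timing) are ignored.

```python
def compare_unit(stored_bit: int, search_bit: int) -> int:
    """XOR compare unit: 1 on mismatch, 0 on match (ideal logic levels)."""
    return stored_bit ^ search_bit


def pulldown_on(stored_bit: int, search_bit: int) -> bool:
    """The pull-down transistor M conducts only when the cell mismatches."""
    return compare_unit(stored_bit, search_bit) == 1


# Truth table of one cell: (D, S) -> is the pull-down transistor ON?
truth_table = {(d, s): pulldown_on(d, s) for d in (0, 1) for s in (0, 1)}

if __name__ == "__main__":
    for (d, s), on in truth_table.items():
        print(f"D={d} S={s} -> pull-down {'ON' if on else 'OFF'}")
```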

## 2.1. Conventional NOR-Type CAM

Fig. 1(b) shows the conventional NOR-type CAM design, in which the CAM cells are XOR-type and the pull-down transistors of the CAM cells are arranged in NOR type. A search operation has two phases: the precharge phase and the evaluation phase. During the precharge phase, PRE = 1 precharges the ML to high. Then, PRE is pulled down to 0 to start the evaluation phase. For a CAM word, if one or more cells mismatch, the ML is discharged to 0.

Only when all cells match, i.e., the search data are identical to the stored data, does the ML retain the logic high of the precharge phase. Because the pull-down path is very short, the ML is discharged to 0 quickly in case of a mismatch, so the NOR-type CAM provides the best search performance. Note that the pull-down transistors arranged in NOR type benefit search performance, but they contribute many drain capacitances to the ML. Because most CAM words are mismatched in many applications, the large amount of ML switching consumes a huge dynamic power. For example, in the CAM tag used in a translation look-aside buffer or cache memory, at most one word matches on each lookup, which implies that almost all MLs are discharged to 0 and then charged to high again before the next search. Consequently, the NOR-type CAM is power inefficient, although it provides the best performance.

In contrast to the NOR-type CAM, the NAND-type CAM aims to reduce the power dissipated in the search operation: the pull-down transistors of the CAM cells in the same word are arranged in NAND type, i.e., in series. The ML is initially precharged to high and is discharged to 0 only when all CAM cells match. Because the load capacitance of the ML is small and only a few MLs are discharged to 0 during a search, the power consumption is minimal. However, the pull-down path is long, so the ML discharge is very slow in case of a match. Thus, the NAND-type CAM trades search performance for a large power saving.
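The switching contrast between the two architectures can be made concrete with a minimal sketch, assuming ideal behavior: a NOR-type ML discharges when any cell mismatches, while a NAND-type ML discharges only when every cell matches. The 8-word tag table is hypothetical, chosen with unique entries so that at most one word matches, as in the TLB/cache example above.

```python
from typing import List, Tuple


def word_matches(stored: str, key: str) -> bool:
    """A word matches only if every bit matches."""
    return stored == key


def nor_discharges(stored: str, key: str) -> bool:
    # NOR-type: parallel pull-downs, any mismatching cell discharges the ML.
    return not word_matches(stored, key)


def nand_discharges(stored: str, key: str) -> bool:
    # NAND-type: series pull-downs, the chain conducts only on a full match.
    return word_matches(stored, key)


def count_switching(table: List[str], key: str) -> Tuple[int, int]:
    """Number of MLs that discharge (and must be recharged) in one lookup."""
    nor = sum(nor_discharges(w, key) for w in table)
    nand = sum(nand_discharges(w, key) for w in table)
    return nor, nand


# Hypothetical 8-word, 4-bit tag table with unique entries.
table = ["0000", "0011", "0101", "0110", "1001", "1010", "1100", "1111"]

if __name__ == "__main__":
    print(count_switching(table, "0101"))  # one word matches
    print(count_switching(table, "1110"))  # no word matches
```

In this idealized count, the NOR-type array recharges almost every ML on each lookup, while the NAND-type array recharges at most one.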

## 2.2. Related Work

There is a large body of work, including our previous work [3], [4], on improving the power efficiency of CAM, especially on ML power reduction. Zukowski and Wang introduced a selective precharge technique that reduces the ML power consumption by breaking a CAM word into two stages: a small subset of CAM cells performs a precalculation that determines whether the ML needs to be precharged. The same or a similar design concept was also used in other designs. A similar idea underlies the pipelined search scheme, in which a CAM word is further divided into several segments and only the words that match one segment proceed to the next segment search. As described above, these segmentation methods reduce power only in the best case, where the first segment filters out the unnecessary comparisons. To overcome this drawback, a charge-shared ML scheme [15] was proposed to reduce the worst-case power consumption. Assume that the first segment matches. In the conventional design, the first-segment ML1 is discharged to 0 before the second-segment search, but the charge-shared ML scheme [15] distributes the ML1 charge to the second-segment ML2, i.e., it recycles the ML1 charge. Therefore, it reduces both the best-case and worst-case power consumption.
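Why these segmentation methods help only in the best case can be seen from a minimal sketch, assuming a hypothetical two-segment word in which the second segment is evaluated (and its ML precharged) only for words whose first segment matches. The number of segment evaluations serves as a rough proxy for ML switching; the function name and tables are illustrative only.

```python
from typing import List, Tuple


def segmented_search(table: List[str], key: str, split: int) -> Tuple[List[int], int]:
    """Two-stage search: stage 2 runs only where stage 1 matched.

    Returns (indices of fully matching words, number of segment evaluations).
    """
    evaluations = 0
    matches = []
    for i, word in enumerate(table):
        evaluations += 1  # the first segment is always evaluated
        if word[:split] != key[:split]:
            continue  # filtered out: the second segment is never precharged
        evaluations += 1  # the second segment is evaluated
        if word[split:] == key[split:]:
            matches.append(i)
    return matches, evaluations


if __name__ == "__main__":
    # Best case: the first segment filters out most words.
    print(segmented_search(["0000", "0011", "0101", "0110"], "0101", 2))
    # Worst case: every word passes the first segment, so nothing is saved.
    print(segmented_search(["0100", "0101", "0110", "0111"], "0101", 2))
```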



Fig. 2. SMA proposed in [13].



Fig. 3. Shadow ML voltage-detecting design [16].

In [13], the segmented ML architecture (SMA) was proposed. As shown in Fig. 2, the SMA partitions the entire ML into four segments, which are grouped into a precharged type and a charge-shared type. First, only the precharged segments are charged; then, the charge spread signal is enabled during the match evaluation phase, which causes charge sharing between the charge-shared segments and the precharged segments. Because only the precharged segments need to be charged, the ML power consumption can be effectively reduced.

Fig. 3 shows the shadow ML design [16], which is mainly composed of a level shifter (LS) and a voltage detector (VD). In the evaluation phase, the VD charges the ML and senses the shadow ML voltage at the same time. In case of a mismatch, at least one path between the ML and the shadow ML conducts. The shadow ML voltage is first shifted by the LS and then toggles the VD to disable the ML charge path. By cutting the charge time short, the ML power consumption can be reduced. Zhang et al. [17] proposed a current-recycling technique to improve the power efficiency of the shadow ML design [16]. Note that the techniques described above are all NOR-type ML architectures. In contrast, the PF-CDPD AND-type ML scheme [18] is a NAND-type ML design with a low-power feature. Based on the PF-CDPD [18], various arrangements of ML segmentation have been investigated, including the tree style [19] and the butterfly style [20]. In particular, Huang et al. [21] combined the butterfly ML with multimode data-retention power gating, super cutoff power gating [20], and the hierarchy search-line scheme [22] to both improve the performance and reduce the power consumption.

## 3. MSML DESIGN

## 3.1. Overview

The key idea behind our design is to combine the master–slave architecture with the charge refill minimization technique to reduce the ML switching power. Fig. 4 shows an MSML design example, $MS_2$, which consists of one MML and two SMLs. Unlike the conventional CAM design, which uses a single ML, our design uses both the MML and the SMLs to perform the search operation. By sharing charge between the MML and the SMLs, we effectively reduce the MML refill swing, so the search power dissipated in the MMLs can be largely reduced.

Besides the MML and SMLs, Fig. 4 also includes an additional final-ML (FML) that indicates the match result. Note that the parasitic capacitance of the FML is generally smaller than that of the MML.

## 3.2. Search Operation

Similar to the conventional CAM, our design has two phases during a search: the precharge phase and the match evaluation phase. In the precharge phase, the MML and FML are precharged to high; then, in the match evaluation phase, only a mismatch changes the logic level of the FML from high to low.

Precharge Phase: In this phase, the control signal PRE is high. Thus, the MML and FML are precharged to high, and all SMLs are discharged to 0.

Match Evaluation Phase: After the precharge phase, the control signal PRE is pulled down to 0 and the search data are loaded onto the search lines to start the matching process.

Case 1 (Both SML<sub>1</sub> and SML<sub>2</sub> Match):

This is the only match case. Neither charge sharing path S1 nor S2 conducts, and all MLs keep the same logic levels as in the precharge phase.

Case 2 (Either SML<sub>1</sub> or SML<sub>2</sub> Mismatches):

We first assume that $SML_1$ is mismatched and $SML_2$ is matched. In the $SML_1$ segment, at least one share transistor is turned ON to conduct the charge sharing path **S1**, so the MML charge is distributed to $SML_1$. As a result, the $SML_1$ voltage rises while the MML voltage falls. After charge sharing completes, the MML and $SML_1$ settle at the same voltage, called the final balance voltage. According to the charge sharing equation, the final balance voltage $V_B$ is

$$V_B = \frac{C_{\rm MML}}{C_{\rm MML} + C_{\rm SML1}} V_{\rm MML} \approx \frac{2}{3} V_{\rm MML} \qquad (1)$$

where $C_{\text{MML}}$ and $C_{\text{SML1}}$ are the capacitances of the MML and SML<sub>1</sub>, and $V_{\text{MML}}$ is the initial MML voltage. Because the MML capacitance is roughly twice the SML<sub>1</sub> capacitance, the result simplifies to $2V_{\text{MML}}/3$. Fig. 5 shows the waveform for this case, obtained from HSPICE simulation using TSMC 90-nm technology with $V_{\text{DD}} = 1$ V and a 32-bit word size. The final balance voltage is 0.63 V, slightly lower than the theoretical value of 0.67 V. In the following precharge phase, the MML must be recharged to the full $V_{\text{DD}}$, so the charge refill swing, $\text{CRS} = V_{\text{DD}} - V_{\text{B}}$, is only 0.37 V, much less than the full swing. This is why the proposed design reduces the ML power consumption. The same behavior is observed under the opposite assumption, where $SML_2$ is mismatched and $SML_1$ is matched.
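A quick numerical check of (1) and of the CRS definition, as a minimal sketch assuming $C_{\text{MML}} = 2C_{\text{SML1}}$ as stated above. Only the capacitance ratio matters, so the capacitances are in arbitrary units; 0.63 V is the simulated value quoted in the text, and the helper names are illustrative.

```python
def balance_voltage(c_mml: float, c_shared: float, v_mml: float) -> float:
    """Charge sharing: the MML charge C_MML * V_MML spreads over C_MML + C_shared."""
    return c_mml / (c_mml + c_shared) * v_mml


def charge_refill_swing(vdd: float, vb: float) -> float:
    """CRS = V_DD - V_B, the swing by which the MML must be refilled."""
    return vdd - vb


VDD = 1.0
C_SML1 = 1.0          # arbitrary unit
C_MML = 2.0 * C_SML1  # "roughly two times" the SML1 capacitance

vb_ideal = balance_voltage(C_MML, C_SML1, VDD)
vb_sim = 0.63  # HSPICE value reported in the text (32-bit word, TSMC 90 nm)

if __name__ == "__main__":
    print(f"ideal V_B = {vb_ideal:.3f} V, CRS = {charge_refill_swing(VDD, vb_ideal):.3f} V")
    print(f"simulated V_B = {vb_sim:.2f} V, CRS = {charge_refill_swing(VDD, vb_sim):.2f} V")
```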



Fig. 4. MSML design configured with two SMLs.

| Case   | SML<sub>1</sub> | SML<sub>2</sub> | Path S1 | Path S2 | Path P | V(MML)              | V(SML<sub>1</sub>)  | V(SML<sub>2</sub>)  | V(FML)   | Result   |
|--------|-----------------|-----------------|---------|---------|--------|---------------------|---------------------|---------------------|----------|----------|
| Case 1 | match           | match           | X       | X       | X      | $V_{DD}$            | 0                   | 0                   | $V_{DD}$ | match    |
| Case 2 | mismatch        | match           | O       | X       | O      | $\frac{2}{3}V_{DD}$ | $\frac{2}{3}V_{DD}$ | 0                   | 0        | mismatch |
| Case 2 | match           | mismatch        | X       | O       | O      | $\frac{2}{3}V_{DD}$ | 0                   | $\frac{2}{3}V_{DD}$ | 0        | mismatch |
| Case 3 | mismatch        | mismatch        | O       | O       | O      | $\frac{1}{2}V_{DD}$ | $\frac{1}{2}V_{DD}$ | $\frac{1}{2}V_{DD}$ | 0        | mismatch |

Table 1. Key Node Voltages and Path Connection/Disconnection (O/X) for Each Case in the MSML Design

Case 3 (Both SML<sub>1</sub> and SML<sub>2</sub> Mismatch):

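In this case, both the $SML_1$ and $SML_2$ segments are mismatched. Because both charge sharing paths **S1** and **S2** conduct, the MML charge is distributed to both $SML_1$ and $SML_2$, and the final balance voltage is given by (2). Because the MML capacitance is roughly equal to the combined capacitance of $SML_1$ and $SML_2$, $V_B$ is approximately $V_{\rm MML}/2$.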



Fig. 5. Waveform of cases 2 and 3, where CRS is the charge refill swing.

$$V_B = \frac{C_{\rm MML}}{C_{\rm MML} + C_{\rm SML1} + C_{\rm SML2}} V_{\rm MML} \approx \frac{1}{2} V_{\rm MML} \qquad (2)$$
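The ideal entries of Table 1 follow from (1) and (2), and a minimal behavioral sketch can reproduce them. It assumes each SML has half the MML capacitance (as in the $MS_2$ example), SMLs that start at 0 V, and an ideal FML that discharges whenever any SML receives charge. The column "P" is assumed here to denote the FML pull-down path, and `msml_evaluate` is a hypothetical helper, not part of the design.

```python
from fractions import Fraction
from typing import Dict, Sequence


def msml_evaluate(sml_mismatch: Sequence[bool], c_ratio: int = 2) -> Dict[str, object]:
    """Ideal MSML match evaluation for one word (hypothetical helper).

    sml_mismatch[i] is True when segment i contains a mismatched cell, which
    turns ON a share transistor and closes the charge-sharing path S(i+1).
    c_ratio = C_MML / C_SML for each SML (about 2 in the MS_2 example).
    Node voltages are returned as fractions of V_DD.
    """
    k = sum(sml_mismatch)  # number of conducting charge-sharing paths
    # Charge conservation: C_MML * V_DD = (C_MML + k * C_SML) * V_B, SMLs start at 0 V.
    vb = Fraction(c_ratio, c_ratio + k)
    return {
        "S": ["O" if m else "X" for m in sml_mismatch],
        "P": "O" if k else "X",  # assumed FML pull-down path
        "MML": vb,
        "SML": [vb if m else Fraction(0) for m in sml_mismatch],
        "FML": Fraction(0) if k else Fraction(1),
        "result": "mismatch" if k else "match",
    }


if __name__ == "__main__":
    for case in ([False, False], [True, False], [False, True], [True, True]):
        r = msml_evaluate(case)
        print(case, r["S"], r["P"], r["MML"], r["result"])
```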

| Configuration   | 1-miss | 2-miss | 3-miss | 4-miss | 5-miss | 6-miss | 7-miss | 8-miss |
|-----------------|--------|--------|--------|--------|--------|--------|--------|--------|
| MS<sub>1</sub>  | 0.47V  |        |        |        |        |        |        |        |
| MS<sub>2</sub>  | 0.63V  | 0.47V  |        |        |        |        |        |        |
| MS<sub>4</sub>  | 0.78V  | 0.62V  | 0.53V  | 0.46V  |        |        |        |        |
| MS<sub>8</sub>  | 0.85V  | 0.75V  | 0.67V  | 0.61V  | 0.56V  | 0.52V  | 0.48V  | 0.45V  |

Table 2. Final Balance Voltage for the 128-Bit MSML Design with Various Configurations Under $V_{DD}$ = 1 V, Room Temperature 27 °C, and TT Process Corner
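Equations (1) and (2) generalize to $k$ mismatched SMLs as $V_B = C_{\rm MML}/(C_{\rm MML} + k\,C_{\rm SML})\,V_{\rm DD}$. The sketch below assumes the MML capacitance equals the total SML capacitance, $C_{\rm MML} = n\,C_{\rm SML}$ for an $MS_n$ configuration (which reproduces both 2/3 and 1/2 for $MS_2$), and compares this ideal value with the measured Table 2 entries. The ideal model ignores the MSML overhead capacitance, which is one plausible reason every measured value sits below it.

```python
# Measured final balance voltages from Table 2 (128-bit word, V_DD = 1 V, TT, 27 C).
# Keys are SML counts n; list position k-1 holds the k-miss value.
MEASURED = {
    1: [0.47],
    2: [0.63, 0.47],
    4: [0.78, 0.62, 0.53, 0.46],
    8: [0.85, 0.75, 0.67, 0.61, 0.56, 0.52, 0.48, 0.45],
}


def ideal_vb(n: int, k: int, vdd: float = 1.0) -> float:
    """Ideal V_B for k mismatched SMLs, assuming C_MML = n * C_SML."""
    return n / (n + k) * vdd


if __name__ == "__main__":
    for n, row in MEASURED.items():
        for k, vb in enumerate(row, start=1):
            print(f"MS_{n} {k}-miss: ideal {ideal_vb(n, k):.2f} V, measured {vb:.2f} V")
```

Under this assumption, the all-mismatch case always gives exactly $V_{\rm DD}/2$, regardless of $n$.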

## 3.3. Comparisons

Unlike the ML segmentation designs [3]-[9], which reduce ML power only in the best case, the MSML design reduces ML power in all cases, and its power saving is still 50% in the worst case. Therefore, a 50% ML power saving is theoretically guaranteed in the MSML design. Of course, if the CAM word size is too small, the hardware overhead of MSML diminishes the power saving, so the optimal value occurs only in CAMs with large word sizes, as shown in the following section. The search operation also shows that the MSML, like the conventional NOR-type CAM design, consumes no ML dynamic power in the match case. By contrast, the SMA design [13] consumes ML power even in the match case. In addition, the SMA design [13] requires a specific sense amplifier that must be sensitive to a half-voltage swing to perform the match evaluation; such an amplifier consumes additional power, which diminishes the power efficiency.
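The origin of the 50% figure can be illustrated with a first-order sketch, assuming the MML has the same capacitance $C$ as a conventional ML and ignoring the SML, FML, and control overhead. Restoring a line by a swing $\Delta V$ from a supply at $V_{\rm DD}$ draws charge $C\,\Delta V$ and hence energy $C\,V_{\rm DD}\,\Delta V$. A conventional mismatched ML needs $\Delta V = V_{\rm DD}$, whereas the MML needs only the CRS, $V_{\rm DD} - V_B$, so the saving is $V_B/V_{\rm DD}$, which is at least 1/2 whenever $V_B \ge V_{\rm DD}/2$. The function names are illustrative.

```python
def refill_energy(c: float, vdd: float, swing: float) -> float:
    """Energy drawn from the supply to restore a line by `swing`: E = C * V_DD * swing."""
    return c * vdd * swing


def msml_saving(vb: float, vdd: float = 1.0, c: float = 1.0) -> float:
    """Fractional ML refill-energy saving of MSML over a full-swing conventional ML.

    First-order model: same line capacitance c, all MSML overhead ignored.
    """
    e_conv = refill_energy(c, vdd, vdd)       # full swing, 0 -> V_DD
    e_msml = refill_energy(c, vdd, vdd - vb)  # only the CRS, V_B -> V_DD
    return 1.0 - e_msml / e_conv              # equals vb / vdd


if __name__ == "__main__":
    for vb in (2 / 3, 0.5, 0.46):
        print(f"V_B = {vb:.2f} V -> saving {msml_saving(vb):.0%}")
```

Plugging in the measured worst-case $V_B$ of about 0.46 V for $MS_4$ in Table 2 gives roughly 46% under the same first-order model, before any overhead is counted.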

## 4. MSML POWER AND PERFORMANCE ISSUES

## 4.1. MSML Power Consumption

Table 2 shows the final balance voltage for the 128-bit MSML design with various configurations at room temperature (27 °C) and the TT process corner. For a given configuration, the best case for power saving is when only one SML is mismatched (1-miss), because its balance voltage is the highest (i.e., its CRS is the smallest). The worst case occurs when all SML segments are mismatched. As discussed previously, the theoretical worst-case balance voltage is 0.5 V, but the real balance voltage is lower than 0.5 V.

## 4.2. MSML Performance

In this paper, the metric used to evaluate CAM performance is the match delay (MD), defined as the time elapsed from PRE = 0 until the FML is discharged to 0 in case of a mismatch. Note that the search-data load time is included in the MD. Fig. 7 shows the MD for the 128-bit MSML design with various configurations. The MD decreases as the number of mismatched SMLs increases, and two observations follow:



Fig. 6. ML power consumption for 128-bit CAM word with various MSML configurations.



Fig. 7. MDs for 128-bit CAM word with various MSML configurations.

- 1) The worst case is when only one SML is mismatched, so only one pull-down transistor ($N_x$) is turned ON to discharge the FML. The best case is when all SML segments are mismatched, since the FML is then discharged through all pull-down paths.
- 2) Because charge sharing between the large MML and a small SML is fast, a small SML size benefits performance. Therefore, the MSML performance can be improved by increasing the number of SML segments.

## 4.3. Modifying the CAM Cell for Better Performance

In the MSML design, the match evaluation process can be decomposed into two steps: 1) the charge shared from the MML first raises the mismatched SML, and 2) the SML then turns ON the pull-down transistor to discharge the FML. Compared with the conventional NOR-type CAM design, the mismatch process of the MSML design is clearly longer. In other words, the MSML design reduces the ML power consumption but may degrade search performance. Moreover, the conventional CAM cell slows down the first step for two reasons.

1) Because the XOR logic is a simple nMOS PTL circuit, the voltage swing of node X is constrained within $0 \sim V_{DD} - V_{TN}$, where $V_{TN}$ is the threshold voltage of the nMOS. The share transistor M can be turned OFF, but it cannot be fully turned ON. The $V_T$ drop reduces the ON conductance of M, which slows down the charge sharing between the MML and SML.

2) Because the bit-line (BL) (or search-line) is a long wire with a large parasitic capacitance, node *X*, which is charged via an nMOS, rises slowly. This further increases the SML rise time and worsens the search performance.



Fig. 8. Modified CAM cell for high search performance. (a) Conventional CAM cell. (b) Modified CAM cell.



Fig. 9. Charge sharing waveforms. (a) Conventional CAM cell. (b) Modified CAM cell.

To eliminate these drawbacks of the conventional CAM cell, we further modify the CAM cell as shown in Fig. 8(b) to improve the search performance of the MSML design. The features of the modified cell are as follows:

- 1) Because the nMOS conductance is better than the pMOS conductance, the share transistor *M* remains an nMOS in the modified CAM cell.
- 2) Instead of the XOR result, $X = D \oplus S$, the modified CAM cell uses the XNOR result, $\overline{D \oplus S}$, and inserts an additional inverter that drives the XNOR result into the full-swing signal $Y$ controlling the share transistor *M*.
- 3) As the charge sharing waveforms in Fig. 9(b) show, the inverter generates a full-swing output $Y$ without a $V_T$ drop, so the share transistor M can be fully turned ON. The fast rise of the inverter output $Y$ makes both the MML fall time and the SML rise time much shorter than those of the conventional CAM cell.
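The effect of the modification on the share transistor M can be sketched at the level of static gate voltages. This is a minimal illustration, assuming an ideal nMOS pass network that passes a logic high as $V_{DD} - V_{TN}$, an ideal full-swing inverter, no body effect, and the source of M at 0 V (the start of charge sharing). $V_{TN}$ = 0.3 V is a hypothetical placeholder, not a value reported in this paper, and the helper names are illustrative.

```python
VDD = 1.0
V_TN = 0.3  # hypothetical nMOS threshold, for illustration only


def conventional_gate(d: int, s: int) -> float:
    """XOR through an nMOS pass-transistor network: a logic high loses V_TN."""
    mismatch = d ^ s
    return (VDD - V_TN) if mismatch else 0.0


def modified_gate(d: int, s: int) -> float:
    """XNOR result restored by an inverter: full-swing output Y."""
    xnor = 1 - (d ^ s)
    return 0.0 if xnor else VDD  # inverter output


def overdrive(v_gate: float) -> float:
    """Initial gate overdrive V_GS - V_TN of M with its source at 0 V (0 if OFF)."""
    return max(0.0, v_gate - V_TN)


if __name__ == "__main__":
    print(f"conventional: {overdrive(conventional_gate(1, 0)):.2f} V overdrive")
    print(f"modified:     {overdrive(modified_gate(1, 0)):.2f} V overdrive")
```

Under these placeholder numbers, the conventional cell leaves M with an initial overdrive of about 0.4 V, while the modified cell gives about 0.7 V. A larger overdrive means a higher ON conductance, which is the qualitative reason for the faster charge sharing; the sketch says nothing about transient timing.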

| Configuration | FF, -40 °C | FF, 27 °C | FF, 85 °C | SS, -40 °C | SS, 27 °C | SS, 85 °C |
|---------------|------------|-----------|-----------|------------|-----------|-----------|
| $MS_1$        | 0.480V     | 0.472V    | 0.463V    | 0.481V     | 0.476V    | 0.470V    |
| $MS_2$        | 0.476V     | 0.466V    | 0.449V    | 0.478V     | 0.473V    | 0.467V    |
| $MS_4$        | 0.472V     | 0.457V    | 0.432V    | 0.473V     | 0.467V    | 0.462V    |
| $MS_8$        | 0.463V     | 0.446V    | 0.415V    | 0.466V     | 0.458V    | 0.453V    |
| $V_T(N_X)$    | 0.196V     | 0.161V    | 0.129V    | 0.354V     | 0.322V    | 0.292V    |

Table 3. Worst-Case $V_B$ of the 128-Bit MSML Design Under Process and Temperature Variations ($V_{DD} = 1$ V)

This revised MSML design with the modified CAM cell is denoted as $MSML_{hp}$. Compared with the conventional CAM cell, the modified cell clearly trades more power and area for better performance. To reduce the area and power penalties of the additional inverter, the inverter is implemented at minimum size in the $MSML_{hp}$ design. The area overhead is discussed in Section 5.4.

## 4.4. Process, Voltage, and Temperature Variations

Besides determining the range of power saving, the final balance voltage must satisfy another critical requirement: in a mismatch, it must be high enough to turn ON the FML nMOS transistor ($N_X$ in Fig. 4). This ensures that our design works correctly even in the worst case, where all SML segments are mismatched and the final balance voltage is the lowest. According to (2), the ideal worst-case balance voltage is $0.5V_{DD}$, but the real value is always lower because of the additional hardware incurred by the MSML design. In fact, both the threshold voltage ($V_T$) and the final balance voltage ($V_B$) can vary significantly under different process and environmental conditions. To confirm that the MSML design works well under different conditions, we measured $V_B$ under process, voltage, and temperature (PVT) variations [24], covering three temperatures (-40 °C, 27 °C, and 85 °C) and five process corners (FF, FS, TT, SF, and SS). Table 3 shows the worst-case $V_B$ for each configuration, where the worst case is that all segments are mismatched. Because the FF and SS corners have the extreme $V_B$ values, only the FF and SS values are presented. Table 3 shows that the worst-case $V_B$ is always larger than the $V_T$ of $N_X$, which ensures that the MSML functions correctly under all combinations of temperature and process corner. In addition, Table 4 shows the lowest working voltage of the $MS_8$ design under PT variations. Because the nominal supply voltage of the 90-nm technology is 1 V, Table 4 indicates that the MSML design works well within a 1 V $\pm$ 20% voltage variation.

|       | FF   | FS   | TT   | SF   | SS   |
|-------|------|------|------|------|------|
| -40°C | 0.5V | 0.6V | 0.6V | 0.7V | 0.8V |
| 27°C  | 0.4V | 0.5V | 0.6V | 0.7V | 0.7V |
| 85°C  | 0.4V | 0.5V | 0.5V | 0.6V | 0.7V |

Table 4. Lowest Working Voltage for the 128-Bit MS<sub>8</sub> Design Under PT Variations
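The correctness condition of this section, $V_B > V_T(N_X)$ in every corner, and the supply-voltage claim can be rechecked directly from the data in Tables 3 and 4. The sketch below only does that bookkeeping on the published numbers and adds no new measurements; the variable and function names are illustrative.

```python
# Worst-case V_B (all SMLs mismatched) from Table 3, 128-bit MSML, V_DD = 1 V.
CORNERS = [("FF", -40), ("FF", 27), ("FF", 85), ("SS", -40), ("SS", 27), ("SS", 85)]
VB_WORST = {
    "MS1": [0.480, 0.472, 0.463, 0.481, 0.476, 0.470],
    "MS2": [0.476, 0.466, 0.449, 0.478, 0.473, 0.467],
    "MS4": [0.472, 0.457, 0.432, 0.473, 0.467, 0.462],
    "MS8": [0.463, 0.446, 0.415, 0.466, 0.458, 0.453],
}
VT_NX = [0.196, 0.161, 0.129, 0.354, 0.322, 0.292]

# Lowest working voltage of MS8 from Table 4 (keys: temperature, then corner).
LOWEST_VDD = {
    -40: {"FF": 0.5, "FS": 0.6, "TT": 0.6, "SF": 0.7, "SS": 0.8},
    27: {"FF": 0.4, "FS": 0.5, "TT": 0.6, "SF": 0.7, "SS": 0.7},
    85: {"FF": 0.4, "FS": 0.5, "TT": 0.5, "SF": 0.6, "SS": 0.7},
}


def smallest_margin():
    """Return (V_B - V_T(N_X), configuration, corner) with the smallest margin."""
    return min(
        (vb - vt, cfg, corner)
        for cfg, row in VB_WORST.items()
        for vb, vt, corner in zip(row, VT_NX, CORNERS)
    )


if __name__ == "__main__":
    margin, cfg, corner = smallest_margin()
    print(f"smallest margin {margin:.3f} V at {cfg}, {corner}")
    worst_vdd = max(v for row in LOWEST_VDD.values() for v in row.values())
    print(f"highest minimum V_DD: {worst_vdd} V (1 V - 20% = 0.8 V)")
```

The tightest case is $MS_8$ at the SS corner and -40 °C, where the margin is still about 0.11 V.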



Fig. 10. Worst-case MD for 128-bit MS<sub>8</sub> design under PT variations.

Fig. 10 shows the worst-case MD of the 128-bit MS<sub>8</sub> design under PT variations.

- 1) Taking 27 °C as the base, the delay varies on average from 87% to 110% as the temperature goes from -40 °C to 85 °C.
- 2) The FF and SS corners have the best and worst delay values, respectively. At a given temperature, the worst-case MD increases as the process corner moves from FF to SS. Taking the TT corner as the base, the delay varies on average from 75% to 162% from the FF to the SS corner.

## 5. EXPERIMENTAL RESULTS

Because the proposed MSML is a NOR-type design, we compare it only with related NOR-type ML designs rather than NAND-type ML designs. We use TSMC 90-nm technology to implement the MSML design. Besides the conventional NOR-type CAM, we also implement the SMA and Shadow designs for comparison; they are denoted as Conv, SMA, and Shadow, respectively. All simulations are performed at the TT process corner with $V_{DD} = 1$ V and a temperature of 27 °C. To investigate the effect of word size, all designs are applied to three CAM arrays that each contain 128 words but have different word sizes: 128 words $\times$ 32 bits, 128 words $\times$ 64 bits, and 128 words $\times$ 128 bits. In the MSML design, the user can configure the SML number depending on the application. Besides affecting the hardware overhead, the SML number strongly influences the performance and power efficiency of our design. The reasonable SML segment numbers are 1, 2, 4, and 8, denoted as MS<sub>1</sub>, MS<sub>2</sub>, MS<sub>4</sub>, and MS<sub>8</sub> in the following discussion. Like our design, the SMA [13] is a segmentation method, but it always uses four segments. We implement only the segmentation with the best energy efficiency for each word size. According to the results shown in [13], the optimal size of the SMA precharged segment is 7 bits for the 32-bit word size and 14 bits for the 64-bit word size.



Fig. 11. Worst-case MD for various CAM designs with different word size. (a) 32-bit word size. (b) 64-bit word size. (c) 128-bit word size.

## 5.1. Worst-Case Performance

For a fair comparison, all performance results in this paper are measured in the worst case. As defined in Section 4, we use the worst-case MD to represent CAM performance. Fig. 11 shows the worst-case MD of various CAM designs with 32-bit, 64-bit, and 128-bit word sizes. Because it has no segmentation, the MD of the conventional NOR-type CAM is independent of the SML number and fixed at 245, 298, and 391 ps for the 32-bit, 64-bit, and 128-bit word sizes, respectively. Similarly, the MDs of SMA [13] and Shadow [17] are constant and slightly better than the MD of Conv, since both the SMA and Shadow designs reduce the effective ML capacitance and voltage.

- 1) With an additional inverter, the MSML<sub>hp</sub> effectively improves the MSML performance: on average, the MD improvements are 13%, 16%, and 20% for the 32-bit, 64-bit, and 128-bit word sizes, respectively.
- 2) For a given word size, a large SML segment number implies a small SML segment size. Because charge sharing between the large MML and a small SML is fast, the MSML (and MSML<sub>hp</sub>) performance increases with the SML segment number.
- 3) From the performance aspect, the MSML design is unfavorable for CAMs with small word sizes. Because the ML capacitance in the small word size case is small enough to give a short MD, the two-step match process of the MSML design inevitably incurs a performance overhead, especially in the 32-bit case, where the MDs of both MSML and MSML<sub>hp</sub> are worse than those of the other designs. In contrast, in the 128-bit word size case, the MSML<sub>hp</sub> MS<sub>8</sub> performance is the best of all designs: it is up to 24%, 9%, and 17% better than Conv, SMA, and Shadow, respectively.

## 5.2. Average ML Mismatch Power and Energy Consumption

Because the MSML<sub>hp</sub> design inserts an inverter into the CAM cell, the leakage power increases from 4.778 × 10<sup>-9</sup> W to 5.394 × 10<sup>-9</sup> W, a leakage penalty of about 13%. For a fair comparison, the following power data are the sum of the ML dynamic power and the cell leakage power. Fig. 12 shows the average ML power consumption in the mismatch case for various CAM designs. The power clearly depends on the ML capacitance, i.e., the word size, and in the MSML design it also varies with the number of mismatched SMLs. These values are obtained by averaging the power results of all mismatch cases weighted by their probabilities. For example, there are four mismatch cases in the MS<sub>4</sub> configuration. Assuming each SML segment is mismatched independently with probability 1/2 and excluding the all-match case, the probabilities of 1, 2, 3, and 4 mismatched SMLs are 4/15, 6/15, 4/15, and 1/15, respectively.
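The mismatch probabilities quoted above follow from assuming that each of the $n$ SMLs is mismatched independently with probability 1/2 and conditioning on at least one mismatch. The sketch below reproduces them and, as a purely illustrative extra step, weights the $MS_4$ balance voltages of Table 2 by them. The paper's power averages come from simulated power per case, not from this $V_B$ average; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def mismatch_distribution(n: int, p: Fraction = Fraction(1, 2)):
    """P(k of n SMLs mismatched | at least one mismatched), independent segments."""
    raw = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]


# Measured MS4 balance voltages (1- to 4-miss) from Table 2, 128-bit, V_DD = 1 V.
MS4_VB = [0.78, 0.62, 0.53, 0.46]

if __name__ == "__main__":
    probs = mismatch_distribution(4)
    print([str(q) for q in probs])  # ['4/15', '2/5', '4/15', '1/15']
    avg_vb = sum(float(q) * vb for q, vb in zip(probs, MS4_VB))
    print(f"probability-weighted V_B = {avg_vb:.3f} V, CRS = {1 - avg_vb:.3f} V")
```

Note that `Fraction` reduces 6/15 to 2/5.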

- In the MSML<sub>hp</sub> design, the modified CAM cell incurs a power penalty for better performance. Compared with the original MSML design, the ML power overhead of MSML<sub>hp</sub> is up to 10%, 12%, and 14% for the 32-bit, 64-bit, and 128-bit word sizes, respectively.
- As the SML number increases, the added overhead diminishes the power efficiency gained from the MSML (or MSML<sub>hp</sub>) design. This effect is particularly obvious for small word sizes.

From the power efficiency aspect, both MSML and MSML<sub>hp</sub> are unfavorable for CAM arrays with small word sizes, such as the 32-bit case.









Because power alone does not account for differences in MD, we use the energy for a fair comparison, defined as the product of the MD and the ML power. Fig. 13 shows the average ML energy consumption of various CAM designs, with all values normalized to the ML energy of the conventional CAM design.

- 1) In all cases, the MSML<sub>hp</sub> has better energy efficiency than the original MSML design, even though the MSML<sub>hp</sub> incurs more power overhead.
- 2) In general, the MSML energy efficiency can be improved by increasing the SML number. However, because of the MSML design overhead, this rule does not hold for small word sizes.

In both the 64-bit and 128-bit word sizes, the MSML and $MSML_{hp}$ designs have the best ML energy efficiency when the SML number is 4. In the 128-bit case, the $MSML_{hp}$ design with the $MS_4$ configuration reduces the ML energy consumption of Conv, SMA, and Shadow by 61%, 15%, and 30%, respectively.

## 5.3. Energy Delay Product

To emphasize the search performance, we finally use the EDP metric to evaluate the various CAM designs.



Fig. 14. Normalized ML EDP results for various CAM designs with different word size. (a) 32-bit word size. (b) 64-bit word size. (c) 128-bit word size.

- 1) Neither the MSML design nor the MSML<sub>hp</sub> design is favorable for CAM memory with a 32-bit word size.
- For the MSML<sub>hp</sub> design, the best SML number is 4. For 64-bit word size, the MS<sub>4</sub> design improves the EDP of Conv, SMA, and Shadow [17] by 50%, -7%, and 10%, respectively; the negative value means that its EDP is 7% worse than that of SMA. For 128-bit word size, the improvements reach 69%, 21%, and 40%.
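As a reference for how such EDP figures can be computed, the sketch below assumes the standard definition EDP = ML energy × match delay; with the ML energy defined as MD × ML power, this gives EDP = P<sub>ML</sub> × MD². The improvement convention (one minus the EDP ratio) is also an assumption, under which a negative value such as the -7% versus SMA indicates a higher EDP than the baseline. Function names and inputs are hypothetical.

```python
# Sketch of the EDP metric, assuming EDP = E_ML * MD and E_ML = MD * P_ML.
# Function names and example inputs are hypothetical, not the paper's data.


def edp(match_delay_s: float, ml_power_w: float) -> float:
    """Energy-delay product in J*s, i.e. P_ML * MD**2."""
    energy_j = match_delay_s * ml_power_w  # ML energy per search
    return energy_j * match_delay_s


def edp_improvement(edp_design: float, edp_baseline: float) -> float:
    """Positive: lower EDP than the baseline; negative: higher EDP (worse)."""
    return 1.0 - edp_design / edp_baseline


if __name__ == "__main__":
    base = edp(1.0e-9, 2.0e-3)
    faster = edp(0.8e-9, 2.0e-3)  # hypothetical: 20% shorter match delay
    print(f"improvement: {edp_improvement(faster, base):.0%}")
```

Because delay enters squared, a design that shortens the match delay gains more in EDP than in energy, which is why a faster cell can pay off under this metric even when it costs some extra power.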

#### 5.4. Area Overhead

Clearly, compared with the conventional CAM design, the MSML design, and MSML<sub>hp</sub> in particular, requires more interconnection wires and transistors in exchange for better energy efficiency. Based on the above results, we consider our design with the MS<sub>4</sub> configuration, for which the MSML<sub>hp</sub> CAM array measures roughly 233 μm × 726 μm. Compared with the conventional CAM array, whose size is 233 μm × 617 μm, the MSML<sub>hp</sub> design incurs a 17% area overhead for a 69% EDP improvement. Even compared with the original MSML design, whose size is 233 μm × 634 μm, the additional inverter used in the MSML<sub>hp</sub> design costs a 15% area overhead for a 20% match delay improvement. Note that the heights of both the MSML and MSML<sub>hp</sub> CAM arrays are deliberately kept equal to the height of the conventional CAM array, so that all designs dissipate the same power in BL switching. This ensures that the MSML and MSML<sub>hp</sub> designs reduce the ML power without increasing the BL power.
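The area-overhead percentages follow directly from the quoted array dimensions. Because all three arrays share the same 233 μm height, each overhead reduces to a ratio of widths. The helper names are illustrative; running the sketch prints about 17.7% and 14.5%, consistent with the rounded 17% and 15% figures in the text.

```python
# Check of the area overheads using the array dimensions quoted in the text
# (all in micrometres). Helper names are illustrative only.


def area_um2(height_um: float, width_um: float) -> float:
    """Rectangular array area in square micrometres."""
    return height_um * width_um


def overhead(area_new: float, area_ref: float) -> float:
    """Fractional area increase of area_new over area_ref."""
    return area_new / area_ref - 1.0


if __name__ == "__main__":
    conv = area_um2(233, 617)      # conventional CAM array
    msml = area_um2(233, 634)      # original MSML array
    msml_hp = area_um2(233, 726)   # MSML_hp array, MS_4 configuration
    print(f"MSML_hp vs Conv: {overhead(msml_hp, conv):.1%}")
    print(f"MSML_hp vs MSML: {overhead(msml_hp, msml):.1%}")
```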

## 6. CONCLUSION

This paper introduces a low-power ML design, called the MSML design, which combines the master–slave architecture with the charge refill minimization technique to reduce the CAM ML power consumption. The HSPICE simulation results show that the proposed MSML design is better suited to large word sizes (64-bit or 128-bit) than to small word sizes (32-bit). By minimizing the MML charge loss, the MSML design can greatly reduce the ML energy consumption. Unlike most related work, where the power saving depends on the occurrence of the best case, the MSML design theoretically guarantees at least 50% ML power saving. This feature makes the MSML design more attractive than other related work. In addition, we propose a modified CAM cell (MSML<sub>hp</sub>) that improves the MSML search performance by 13%-24%, at the cost of a 15% area overhead and an 8%-14% power penalty compared with the original MSML design.

## REFERENCES

- [1] H. Noda *et al.*, "A cost-efficient high-performance dynamic TCAM with pipelined hierarchical searching and shift redundancy architecture," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 245–253, Jan. 2005.
- [2] B. Agrawal and T. Sherwood, "Ternary CAM power and delay model: Extensions and uses," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 5, pp. 554–564, May 2008.
- [3] Y.-J. Chang, "Two-layer hierarchical matching method for energy-efficient CAM design," *Electron. Lett.*, vol. 43, no. 2, pp. 80–82, Jan. 2007.
- [4] Y.-J. Chang and Y.-H. Liao, "Hybrid-type CAM design for both power and performance efficiency," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 8, pp. 965–974, Aug. 2008.
- [5] C. A. Zukowski and S.-Y. Wang, "Use of selective precharge for low-power content-addressable memories," in *Proc. IEEE Int. Symp. Circuits Syst.*, Jun. 1997, pp. 1788–1791.
- [6] K. H. Cheng, C. H. Wei, and S. Y. Jiang, "Static divided word matching line for low-power content addressable memory design," in *Proc. Int. Symp. Circuits Syst.*, May 2004, pp. 629–632.

- [7] A. Efthymiou and J. D. Garside, "An adaptive serial-parallel CAM architecture for low-power cache blocks," in *Proc. Int. Symp. Low Power Electron. Design*, 2002, pp. 136–141.
- [8] B. D. Yang and L. S. Kim, "A low-power CAM using pulsed NAND-NOR match-line and charge-recycling search-line driver," *IEEE J. Solid-State Circuits*, vol. 40, no. 8, pp. 1736–1744, Aug. 2005.
- [9] D. S. Vijayasarathi, M. Nourani, M. J. Akhbarizadeh, and P. T. Balsara, "Ripple-precharge TCAM: A low-power solution for network search engines," in *Proc. Int. Conf. Comput. Design*, Oct. 2005, pp. 243–248.
- [10] S.-H. Yang, Y.-J. Hung, and J.-F. Li, "A low-power ternary content addressable memory with Pai-Sigma matchlines," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 10, pp. 1909–1913, Oct. 2012.
- [11] G. Kasai, Y. Takarabe, K. Furumi, and M. Yoneda, "200 MHz/200 MSPS 3.2 W at 1.5 V Vdd, 9.4 Mbits ternary CAM with new charge injection match detect circuits and bank selection scheme," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2003, pp. 387–390.
- [12] I. Arsovski, T. Chandler, and A. Sheikholeslami, "A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme," *IEEE J. Solid-State Circuits*, vol. 38, no. 1, pp. 155–158, Jan. 2003.
- [13] S. Baeg, "Low-power ternary content-addressable memory design using a segmented match line," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 6, pp. 1485–1494, Jul. 2008.
- [14] K. Pagiamtzis and A. Sheikholeslami, "A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1512–1519, Sep. 2004.
- [15] N. Mohan and M. Sachdev, "Low-capacitance and charge-shared match lines for low-energy high-performance TCAMs," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 2054–2060, Sep. 2007.